feat: Implement Model Rewrite and Traffic Splitting Logic #1820

zetxqx · 2025-11-05T16:21:08Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request introduces the core logic for model rewriting and weighted traffic
splitting within the request control director. It also includes the reconciler logic for InferenceModelRewrite resources.

Key changes:

pkg/epp/requestcontrol:
- Adds the applyWeightedModelRewrite function to handle model rewriting based on
  InferenceModelRewrite rules.
- Implements weighted selection of target models for traffic splitting.
- Ensures that the oldest InferenceModelRewrite resource takes precedence in case of
  duplicate rules.
pkg/epp/controller:
- Implements the read-only reconciler logic for InferenceModelRewrite resources.

Which issue(s) this PR fixes:

Fixes partially #1811

Does this PR introduce a user-facing change?:

Users can now configure `InferenceModelRewrite` resources to automatically redirect
incoming model requests to different target models. This feature also supports weighted
traffic splitting, allowing requests to be distributed across multiple target models based
on defined percentages.

netlify · 2025-11-05T16:21:14Z

✅ Deploy Preview for gateway-api-inference-extension ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `dc0cca4` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/gateway-api-inference-extension/deploys/691d1ab4c081f80008f90267 |
| 😎 Deploy Preview | https://deploy-preview-1820--gateway-api-inference-extension.netlify.app |


k8s-ci-robot · 2025-11-05T16:21:15Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zetxqx · 2025-11-05T18:01:14Z

/retest

zetxqx · 2025-11-18T01:27:56Z

@ahg-g @kfswain Rebased; this should be ready for review now. Thanks in advance!

nirrozenbaum · 2025-11-18T11:29:22Z

pkg/epp/datastore/datastore.go

```diff
 	// key: InferenceObjective name, value: *InferenceObjective
 	objectives map[string]*v1alpha2.InferenceObjective
+	// key: types.NamespacedName, value: *v1alpha2.InferenceModelRewrite
+	rewrites map[types.NamespacedName]*v1alpha2.InferenceModelRewrite
```


Ideally I don't want to fetch ALL Rewrite objects on every request (assuming there will be objects that are model-specific).
I think this should be reflected in the way we store the objects.
Keeping NamespacedName as the map key doesn't give us any "smart" mapping.
I think we should map from model_name -> rewrite objects, with an empty model name matching all requests. Then, on every request, we call the datastore to get the relevant rewrite objects and get back:

- the specific rewrite objects for the requested model; and
- the general rewrite objects that match all requests.
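The proposed model_name -> rewrite mapping could look roughly like the Go sketch below. It is an illustration under simplifying assumptions, not the PR's `ModelRewriteStore`: `Rewrite`, `rewriteIndex`, `add`, and `lookup` are hypothetical names, and each rewrite is assumed to carry a single model match, with an empty model name meaning "match all requests".

```go
package main

import "fmt"

// Rewrite is a hypothetical, simplified stand-in for an InferenceModelRewrite
// with a single match; an empty MatchModel means "matches every request".
type Rewrite struct {
	Name       string
	MatchModel string
}

// rewriteIndex keys rewrites by the model name they match. The "" key holds
// the generic rewrites that apply to all requests.
type rewriteIndex struct {
	byModel map[string][]*Rewrite
}

func newRewriteIndex() *rewriteIndex {
	return &rewriteIndex{byModel: make(map[string][]*Rewrite)}
}

// add files a rewrite under the model it matches (or under "" if generic).
func (idx *rewriteIndex) add(rw *Rewrite) {
	idx.byModel[rw.MatchModel] = append(idx.byModel[rw.MatchModel], rw)
}

// lookup returns the model-specific and the generic rewrites for a request
// using two map reads, instead of scanning every stored rewrite.
func (idx *rewriteIndex) lookup(model string) (specific, generic []*Rewrite) {
	generic = idx.byModel[""]
	if model == "" {
		// Avoid returning the generic list twice.
		return nil, generic
	}
	return idx.byModel[model], generic
}

func main() {
	idx := newRewriteIndex()
	idx.add(&Rewrite{Name: "canary-all", MatchModel: ""})
	idx.add(&Rewrite{Name: "llama-split", MatchModel: "llama"})

	specific, generic := idx.lookup("llama")
	fmt.Println(len(specific), len(generic)) // 1 1
}
```

The per-request cost no longer depends on how many rewrites exist for other models, which is the point of keying by model name rather than by NamespacedName.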

Good call. I'm working on some patches to improve this and will let you know when they're ready for review.

Added a new ModelRewriteStore for model rewrites, which makes fetching the rewrites for a request efficient (O(1)). Adding a ModelRewrite may not be as efficient, but that operation should not happen very often.

Notably, I changed the API's rule conflict precedence, so now:

1. An Exact match is always considered first.
2. If no exact match is found, or multiple rules have an exact match, we compare the `creationTimestamp`, and the older rule wins.

zetxqx · 2025-11-19T01:14:09Z

apix/v1alpha2/inferencemodelrewrite_types.go

```diff
+	// Across all rules specified on applicable rewrites, precedence MUST be
+	// given to the match having an "Exact" model match over a generic match
+	// (a rule with an empty `matches` array).
+	//
+	// If ties still exist across multiple InferenceModelRewrite resources (e.g.
+	// two rewrites both have an exact match for the same model), matching
+	// precedence MUST be determined by the oldest resource based on
+	// creation timestamp.
+	//
+	// If ties still exist within a single InferenceModelRewrite resource, the
+	// FIRST matching rule (in list order) is used.
 	// +required
```


@nirrozenbaum @ahg-g @kfswain

I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209

The new precedence order is:

1. More specific wins: an Exact match always takes precedence over an All match (where the `matches` array is empty).
2. Tie-breaker (oldest rule): if the rules are equally specific, the rule that was created first (the older rule) wins.

This approach is more intuitive and also simplifies efficient per-request fetching of rewrite rules: once we find an exact match, we no longer need to compare it against less specific, generic rules.

zetxqx · 2025-11-19T01:33:03Z

@ahg-g @nirrozenbaum @kfswain

This is ready for review now. I didn't split it up because most of the parts are related, but I'm open to splitting it if it is too large to review.

The main changes are:

- Modify the API: adjust the conflict precedence rule so that the more specific match wins first, then the older one.
- Add a separate model rewrite datastore to manage model rewrites in memory.
- Wire up the reconciler logic using the model rewrite datastore.
- Wire up the director logic using the model rewrite datastore.

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 5, 2025

k8s-ci-robot requested review from elevran and robscott November 5, 2025 16:21

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 5, 2025

zetxqx mentioned this pull request Nov 5, 2025

feat(api): Introduce InferenceModelRewrite API #1816

Merged

zetxqx force-pushed the modelrerwiteimpl branch from 5302959 to 2c27884 November 5, 2025 17:49

zetxqx force-pushed the modelrerwiteimpl branch from 2c27884 to 48f4afa November 5, 2025 18:16

infModelRewrite reconciler logic.

a3b4528

zetxqx force-pushed the modelrerwiteimpl branch from 48f4afa to 8485ba6 November 18, 2025 01:23

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 18, 2025

zetxqx force-pushed the modelrerwiteimpl branch from 8485ba6 to ae26313 November 18, 2025 01:32

implments model rewrite and traffic splitting.

3e9939f

zetxqx force-pushed the modelrerwiteimpl branch from ae26313 to 3e9939f November 18, 2025 04:23

nirrozenbaum reviewed Nov 18, 2025


k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 19, 2025

zetxqx force-pushed the modelrerwiteimpl branch from 29119d4 to bf1ddce November 19, 2025 01:07

zetxqx commented Nov 19, 2025


more efficient rewrite fetching per request.

dc0cca4

zetxqx force-pushed the modelrerwiteimpl branch from bf1ddce to dc0cca4 November 19, 2025 01:17


3 participants
